Normalizing uniform resource locators

ABSTRACT

The subject matter described in this disclosure can be embodied in methods and systems for receiving, by a computing system, a request to normalize an input Uniform Resource Locator (URL). The computing system identifies a user account to which the request to normalize the input URL relates. The computing system determines a selected set of URL normalization rules that are identified as being activated for the user account from among a larger collection of URL normalization rules. The computing system normalizes the input URL using the selected set of URL normalization rules to generate a normalized URL. The computing system stores information that results from an analysis of the input URL in association with an indication of the normalized URL.

REFERENCE TO CO-PENDING APPLICATION

This application claims the benefit of priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 63/277,957, filed Nov. 10, 2021 and entitled NORMALIZING UNIFORM RESOURCE LOCATORS, and the entire disclosure set forth therein is incorporated herein by reference.

TECHNICAL FIELD

This document generally relates to technologies that normalize uniform resource locators.

BACKGROUND

Web pages are typically accessible via the internet using corresponding uniform resource locators (URLs). URLs can be formed of different portions. For example, the URL “https://www.example.com:443/category?queryterm=parameter#fragment” includes a scheme (https://), a subdomain (www), a domain (example.com), a port number (:443), a path (/category), a query string separator (?), a query string (queryterm=parameter), and a fragment (#fragment). Different variations of a URL can reference the same web page.
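As an illustration only, the portions of the example URL can be separated with Python's standard urllib.parse module. This is a minimal sketch and is not part of the described system; note that the standard library reports the subdomain and domain together as a single hostname.

```python
from urllib.parse import urlsplit

# The example URL from the paragraph above.
url = "https://www.example.com:443/category?queryterm=parameter#fragment"
parts = urlsplit(url)

print(parts.scheme)    # scheme, without the "://" separator
print(parts.hostname)  # subdomain and domain together
print(parts.port)      # port number as an integer
print(parts.path)      # path
print(parts.query)     # query string, without the "?" separator
print(parts.fragment)  # fragment, without the "#" separator
```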

SUMMARY

This document describes techniques, methods, systems, and other mechanisms for normalizing URLs.

Particular implementations of the technology described in this document can, in certain instances, realize one or more of the following advantages. A computing system that includes multiple services that store information in association with indications of normalized URLs can access a single system that uniformly normalizes URLs. As such, the computing system can use a single webpage identifier to access information concerning the same webpage across multiple datasets.

Another advantage is that the URL normalization rules used to normalize a given URL can be user account specific, such that different user accounts may specify different combinations of URL normalization rules to be used in forming normalized URLs for these different user accounts.

A user account may specify that a certain set of URL normalization rules should be used for a first group of webpages and that a different set of URL normalization rules should be used for a second group of webpages. As such, normalization rules can be not only user account specific, but also tailored to specific websites or portions of a website, which enables customization of URL normalization rules to particular webpages.

Normalized URLs can also be hashed to generate a shorter numerical identifier that represents the normalized URL, which can eliminate a future need to access a database in order to find an identifier for a normalized URL.

As additional description to the embodiments described below, the present disclosure describes the following embodiments.

Embodiment 1 is a computer-implemented method, comprising: receiving, by a computing system, a request to normalize an input Uniform Resource Locator (URL); identifying, by the computing system, a user account to which the request to normalize the input URL relates; determining, by the computing system, a selected set of URL normalization rules that are identified as being activated for the user account from among a larger collection of URL normalization rules; normalizing, by the computing system, the input URL using the selected set of URL normalization rules to generate a normalized URL; and storing, by the computing system, information that results from an analysis of the input URL in association with an indication of the normalized URL.

Embodiment 2 is the computer-implemented method of embodiment 1, comprising: receiving, by the computing system, a second request to normalize a second input URL, the second input URL being different from the input URL; identifying, by the computing system, that the second request to normalize the second input URL relates to same said user account to which the request to normalize the input URL relates; normalizing, by the computing system, the second input URL using the selected set of URL normalization rules to generate same said normalized URL, based on the selected set of URL normalization rules being activated for the user account; and storing, by the computing system, second information that results from an analysis of the second input URL in association with the indication of the normalized URL.

Embodiment 3 is the computer-implemented method of embodiment 2, comprising: receiving, by the computing system, a request for information stored in association with the indication of the normalized URL; and accessing, by the computing system, the information and the second information using the indication of the normalized URL to access the information and the second information, without the accessing using the input URL and without the accessing using the second input URL.

Embodiment 4 is the computer-implemented method of any one of embodiments 1-3, comprising: providing, by the computing system for receipt by a computing device at which the user account has logged in, information to cause the computing device to present a user interface to activate normalization rules, the user interface presenting content that indicates multiple URL normalization rules from the collection of URL normalization rules along with corresponding activation interface elements that enable user input to selectively activate selected URL normalization rules of the multiple URL normalization rules that are indicated by the user interface; and receiving, by the computing system, an indication that user input at the computing device interacted with activation interface elements presented by the user interface to activate the selected set of URL normalization rules for the user account.

Embodiment 5 is the computer-implemented method of embodiment 4, wherein: the multiple URL normalization rules represent the collection of URL normalization rules, such that the user interface presents content that indicates all URL normalization rules of the collection of URL normalization rules.

Embodiment 6 is the computer-implemented method of embodiment 4, wherein: the user interface presents content that indicates each default normalization rule in a group of default normalization rules, each default normalization rule in the group of default normalization rules being unaccompanied by a corresponding activation interface element, such that each default normalization rule in the group of default normalization rules is applied during URL normalization regardless of which URL normalization rules of the collection of URL normalization rules are activated in association with the user account.

Embodiment 7 is the computer-implemented method of any one of embodiments 4-6, wherein the user input at the computing device that interacted with the activation interface elements to activate the selected set of URL normalization rules includes: a first user input that interacted with a first activation interface element to activate a first URL normalization rule from the collection of URL normalization rules; and a second user input that interacted with a second activation interface element to activate a second URL normalization rule from the collection of URL normalization rules.

Embodiment 8 is the computer-implemented method of embodiment 7, wherein: the first user input that activated the first URL normalization rule included a first user interaction with the first activation element without entry of user-specified text, the first URL normalization rule not being customizable by user input through interaction with the user interface.

Embodiment 9 is the computer-implemented method of embodiment 8, wherein: the second user input that activated the second URL normalization rule included a second user interaction with the second activation element to enter user-specified text, the user-specified text indicating text content in URLs on which the second URL normalization rule is to operate.

Embodiment 10 is the computer-implemented method of embodiment 9, wherein: the second URL normalization rule is configured to modify the user-specified text in a manner that is defined by the second URL normalization rule and not defined by the user-specified text, the manner that the second URL normalization rule is configured to modify the user-specified text not being customizable by user input through interaction with the user interface.

Embodiment 11 is the computer-implemented method of any one of embodiments 1-10, comprising: receiving, by the computing system, a second request to normalize a second input URL; identifying, by the computing system, a second user account to which the second request to normalize the second input URL relates; determining, by the computing system, a second selected set of URL normalization rules that are identified as being activated for the second user account from among the larger collection of URL normalization rules, the second selected set of URL rules that are activated for the second user account being different from the selected set of URL normalization rules that are activated for the user account; normalizing, by the computing system, the second input URL using the second selected set of URL normalization rules to generate a second normalized URL, the second normalized URL being different from the normalized URL; and storing, by the computing system, information that results from an analysis of the second input URL in association with an indication of the second normalized URL.

Embodiment 12 is the computer-implemented method of any one of embodiments 1-11, wherein: identifying the user account to which the request to normalize the input URL relates includes identifying that the request to normalize the input URL specifies the user account.

Embodiment 13 is the computer-implemented method of any one of embodiments 1-12, comprising: hashing, by the computing system, the normalized URL to identify a hashed value that represents the normalized URL, the indication of the normalized URL comprising the hashed value.

Embodiment 14 is the computer-implemented method of embodiment 13, comprising: receiving, by the computing system, a request for information that results from a URL analysis, the request for information that results from the URL analysis including the hashed value; and accessing, by the computing system using the hashed value and without using the input URL responsive to receiving the hashed value in the request for information that results from the URL analysis, the information that results from the analysis of the input URL.

Embodiment 15 is the computer-implemented method of embodiment 14, comprising: accessing, by the computing system using the hashed value and without using the input URL responsive to receiving the hashed value in the request for information that results from the URL analysis, information that results from an analysis of a second URL, wherein the normalized URL represents a normalized version of the second URL when the second URL is normalized according to the selected set of normalization rules.

Embodiment 16 is the computer-implemented method of any one of embodiments 1-15, comprising: providing, by the computing system for receipt by a computing device at which the user account has logged in, information to cause the computing device to present a user interface to activate normalization rules, the user interface presenting activation interface elements that enable user input to (i) select a selected group of URLs from among multiple different groups of URLs, and (ii) select a portion of the collection of URL normalization rules to apply to the selected group of URLs; receiving, by the computing system, user input that interacts with the activation interface elements of the user interface to specify that the selected set of URL normalization rules are to apply to URLs in a first group of URLs from among the multiple different groups of URLs; and identifying, by the computing system, that the input URL is part of the first group of URLs, wherein determining the selected set of URL normalization rules that are identified as being activated for the user account includes identifying that the selected set of URL normalization rules were specified by user input as applying to the first group of URLs and that the input URL is part of the first group of URLs.

Embodiment 17 is the computer-implemented method of embodiment 16, comprising: receiving, by the computing system, user input that interacts with the activation interface elements presented as part of the user interface to specify that a second selected set of URL normalization rules are to apply to URLs in a second group of URLs from among the multiple different groups of URLs, wherein the second group of URLs is different from the first group of URLs, wherein the second selected set of URL normalization rules are different from the selected set of URL normalization rules.

Embodiment 18 is the computer-implemented method of embodiment 17, comprising: receiving, by the computing system, a request to normalize a second input URL; identifying, by the computing system, that the second input URL is part of the second group of URLs; normalizing, by the computing system, the second input URL using the second selected set of URL normalization rules to generate a second normalized URL, wherein the second selected set of normalization rules includes a normalization rule that is not within the selected set of normalization rules; hashing, by the computing system, the second normalized URL using a combination of all URL rules from the selected set of URL normalization rules and all URL rules from the second selected set of URL normalization rules, including the normalization rule that is not within the second selected set of normalization rules and excluding multiple normalization rules from the collection of URL normalization rules that are not activated for the user account, to identify a second hashed value that identifies the second normalized URL; and storing, by the computing system, information that results from an analysis of the second input URL in association with the second hashed value.

Embodiment 19 is a computing system, comprising: one or more processors; and one or more computer-readable devices including instructions that, when executed by the one or more processors, cause the computing system to perform operations that include: receiving, by the computing system, a request to normalize an input Uniform Resource Locator (URL); identifying, by the computing system, a user account to which the request to normalize the input URL relates; determining, by the computing system, a selected set of URL normalization rules that are identified as being activated for the user account from among a larger collection of URL normalization rules; normalizing, by the computing system, the input URL using the selected set of URL normalization rules to generate a normalized URL; and storing, by the computing system, information that results from an analysis of the input URL in association with an indication of the normalized URL.

The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 shows a computing system that is configured to normalize URLs.

FIG. 2 shows a user interface that enables user input to activate URL normalization rules.

FIG. 3 shows a flowchart of operations in which user input selects normalization rules.

FIGS. 4A-B show a flowchart of operations in which a URL is normalized according to a selected set of rules.

FIG. 5 shows a flowchart of operations to request information using an indication of a normalized URL.

FIGS. 6A-C show different scenarios in which URLs can be normalized.

FIG. 7 is a conceptual diagram of a system that may be used to implement the systems and methods described in this document.

FIG. 8 is a block diagram of computing devices that may be used to implement the systems and methods described in this document, as either a client or as a server or plurality of servers.

Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

This document generally describes technologies for normalizing URLs. A URL normalizing system can be configured to normalize URLs using URL normalization rules. The set of URL normalization rules applied to a given URL may depend on a user account that requested the URL normalization, and which URL normalization rules are activated for that user account.

A user that logs into the URL normalizing system with a given user account may interact with a user interface that presents indications of multiple different URL normalization rules. Some rules may be activated by default and cannot be deactivated by user input, while the user interface may include interface elements that enable user activation of optional rules. User input may activate URL normalization rules for all URLs, or just for a specific group of URLs (e.g., a particular website or a portion thereof).

After normalizing a URL, the URL normalizing system may store information regarding a webpage in conjunction with some sort of indication of the normalized URL (e.g., the normalized URL or a number that represents the normalized URL). This indication of the normalized URL may be used as an identifier to retrieve the information regarding the webpage. In some examples, the indication of the normalized URL is a hash value generated by hashing the normalized URL. In such examples, information regarding a webpage may be stored in conjunction with the hash value. As such, the hash value may be used as an identifier to retrieve information regarding the webpage.

FIG. 1 shows a computing system that is configured to normalize URLs. The computing system includes a URL normalizing system 110, two different types of website analyzers 120 and 130, respective storage for two different types of analysis 140 and 170, storage for a collection of URL normalizing rules 150, and storage for account-specific rule activations 160.

The first type of website analyzer 120 and the second type of website analyzer 130 each represent one or more computer programs that analyze certain characteristics of a website. For example, both types of website analyzers 120 and 130 may be operated by a company that analyzes websites and provides results of such analysis to owners of the websites, to help the owners improve various characteristics of the websites. A first example type of website analysis includes determining how accessible the website is to users that have accessibility issues (e.g., users that have poor eyesight, hearing, or finger dexterity). A second example type of website analysis includes determining how well the website ranks on search engines, commonly called SEO (search engine optimization). A third example type of website analysis includes determining a quality of content of the website, such as whether words are spelled correctly and links are valid.

The first type of website analyzer 120 may be configured to perform one of these types of website analyses (e.g., an accessibility analysis), while the second type of website analyzer 130 may be configured to perform a second of these types of website analyses (e.g., an SEO analysis). Each of the website analyzers 120 and 130 may analyze websites themselves, request analyses from one or more third-party services, or perform a combination of both.

Each website analysis may identify multiple webpages, and multiple issues that occur across the webpages (e.g., misspellings, broken links, accessibility issues on various pages). An identity of each page may be originally specified by a URL. A difficulty is that many webpages can be represented by multiple different URL variations. For example, the first type of website analyzer 120 may receive an analysis from a third-party analysis service, and a URL of a webpage specified in the analysis may include query parameters that the third-party analysis service appended to the URL to track requests to the URL. The first type of website analyzer 120 may perform its own analysis of the same webpage, but the URL recorded for such an analysis may not include the same query parameters. As such, two different analyses for the same webpage may be stored in association with different URLs. Similarly, a second type of analysis of the same webpage by the second type of website analyzer 130 may be associated with yet a different URL.

The URL normalizing system 110 is configured to receive input URLs from the website analyzers 120 and 130, and return normalized URLs. The website analyzers 120 and 130 may then store analyses of webpages in association with (1) the normalized URLs, or (2) indications of the normalized URLs (e.g., numerical identifiers assigned to the normalized URLs). As such, multiple analyses of the same webpage that were initially associated across a computing system with different URLs may be stored in association with a single indication of a normalized URL for the webpage.

The URL normalizing system 110 can receive an input URL in a URL normalization request. The request can specify a user account to which the request relates. For example, an accessibility analysis of a particular website may have been requested by an owner of the particular website. In preparing the analysis, the service that is preparing the accessibility analysis (e.g., the website analyzer 120) may request that URLs identified in the analysis be normalized by the URL normalizing system 110. In such an example, the URL normalization request is being performed on data prepared for a particular user account, and the URL normalization request may therefore specify the particular user account.

The URL normalizing system 110 may have access to a collection of URL normalizing rules 150. The collection of URL normalizing rules may include some rules that are designated as default rules that apply to all URL normalizations, and some rules that are designated as optional rules that apply to a URL normalization only if activated for a certain user account to which a normalization request relates. The account-specific rule activations 160 store information that indicates which optional rules from the collection of normalizing rules 150 are activated for certain user accounts.

The URL normalizing system 110 includes a rule selector 112 that identifies the user account to which a URL normalization request relates, and accesses the collection of URL normalizing rules 150 and the account-specific rule activations 160 to select a set of default and optional URL normalization rules that apply to the URL normalization request. A normalizer 114 applies the selected set of URL normalization rules to an input URL to generate a normalized URL.
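The interplay of the rule selector 112 and the normalizer 114 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this disclosure: the dictionaries RULE_COLLECTION and ACCOUNT_ACTIVATIONS, the function names, and the account "CompanyB" are hypothetical stand-ins for the collection of rules 150 and the account-specific rule activations 160, and only three simplified rules are modeled.

```python
from typing import Callable, Dict, List, Set, Tuple
from urllib.parse import urlsplit, urlunsplit

Rule = Callable[[str], str]

def lowercase_domain(url: str) -> str:
    # Simplified stand-in for a default rule (lowercases the whole authority).
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=parts.netloc.lower()))

def remove_trailing_slash(url: str) -> str:
    # Simplified stand-in for an optional rule.
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path.rstrip("/")))

def http_to_https(url: str) -> str:
    # Simplified stand-in for an optional rule.
    parts = urlsplit(url)
    return urlunsplit(parts._replace(scheme="https")) if parts.scheme == "http" else url

# Hypothetical rule collection: rule ID -> (is_default, rule function).
RULE_COLLECTION: Dict[str, Tuple[bool, Rule]] = {
    "A1": (True, lowercase_domain),
    "B1": (False, remove_trailing_slash),
    "B3": (False, http_to_https),
}

# Hypothetical per-account activations of optional rules.
ACCOUNT_ACTIVATIONS: Dict[str, Set[str]] = {
    "CompanyA": {"B1", "B3"},
    "CompanyB": {"B1"},
}

def select_rules(account: str) -> List[Rule]:
    """Return default rules plus the optional rules activated for the account."""
    activated = ACCOUNT_ACTIVATIONS.get(account, set())
    return [rule for rule_id, (is_default, rule) in RULE_COLLECTION.items()
            if is_default or rule_id in activated]

def normalize(url: str, account: str) -> str:
    """Apply the selected rules in collection order."""
    for rule in select_rules(account):
        url = rule(url)
    return url
```

With these assumed activations, the same input URL normalizes differently for each account: only the default rule applies to an account with no activations, while "CompanyA" also has the trailing slash removed and the scheme changed.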

In some examples, this normalized URL is sent back to the requesting service (e.g., the website analyzer 120). In some examples, a hasher 116 performs a hash on the normalized URL to generate a numerical value that may be used to represent the normalized URL, and this hashed numerical value is sent back to the requesting service. In some examples, both the normalized URL and the hashed numerical value are sent back to the requesting service.
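A hasher of this kind might look like the sketch below. The disclosure does not name a hash function or an identifier width, so the use of SHA-256 truncated to 64 bits, and the function name url_hash, are assumptions made purely for illustration.

```python
import hashlib

def url_hash(normalized_url: str) -> int:
    """Derive a numerical identifier from a normalized URL.

    Assumption: SHA-256 truncated to the first 8 bytes (64 bits).
    """
    digest = hashlib.sha256(normalized_url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")
```

Because the identifier is computed from the normalized URL itself, any service that holds the normalized URL can recompute the same identifier without a database lookup.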

A website analyzer (e.g., the first type of website analyzer 120) may store webpage analysis information in association with an indication of a normalized URL. An indication of a normalized URL may be the normalized URL itself, a hashed numerical identifier generated by hashing the normalized URL, or an artificial value stored in association with the normalized URL in a data storage system (e.g., a “7” to identify the 7th URL that is normalized). As such, multiple analyses that represent the same webpage but were generated from different URLs may be stored in association with the same identifier.

In FIG. 1, the storage 140 is illustrated as including four different collections of analysis information for four different normalized URLs. The designation of a normalized URL in the figure (e.g., “norm_URL#1”) represents an indication of a normalized URL, such as the normalized URL or a hash value generated therefrom. The designation of analysis information in the figure (e.g., “analysis_info1”) can represent analysis information generated as a result of multiple different analyses of a web page.

A website analyzer can also request previously-stored analysis information, as illustrated in FIG. 1 with the second type of website analyzer 130 requesting from storage 170 an analysis stored in association with an indication of a normalized URL. As illustrated in FIG. 1, both storages 140 and 170 can store different analyses for the same identifier. Responsive to a request for analysis information, the storage 170 can return such analysis information from both storages 140 and 170. Both of storages 140 and 170 may store analysis information and respond to requests for analysis information.
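The storage arrangement described above can be pictured with in-memory dictionaries keyed by the indication of the normalized URL. This is only a sketch: the names accessibility_store, seo_store, store_analysis, and fetch_analyses are hypothetical, and a real deployment would presumably use persistent storage rather than dictionaries.

```python
from collections import defaultdict

# Hypothetical stand-ins for storages 140 and 170.
accessibility_store = defaultdict(list)
seo_store = defaultdict(list)

def store_analysis(store, url_indication, analysis_info):
    """File analysis information under the indication of a normalized URL."""
    store[url_indication].append(analysis_info)

def fetch_analyses(url_indication, *stores):
    """Collect everything stored for one indication across the given stores."""
    results = []
    for store in stores:
        results.extend(store.get(url_indication, []))
    return results

# Two analyses produced from different input URLs land under one identifier.
store_analysis(accessibility_store, "norm_URL#1", "analysis_info1")
store_analysis(seo_store, "norm_URL#1", "seo_info1")
```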

FIG. 2 shows a user interface that enables user input to activate URL normalization rules. The user interface includes a list of default rules 210, a list of optional rules 220, and a list of optional rewrite rules 230. These lists represent different categories of rules that can apply to a URL normalization process, in which an input URL is transformed by an application of multiple rules into a normalized URL. The user interface shown in FIG. 2 may be that shown at a computing device that has logged into the URL normalizing system 110 (FIG. 1) with a certain user account. In this illustration, the user interface shows that the user account “CompanyA” has logged into the URL normalizing system 110, and the FIG. 2 user interface shows URL normalization settings for that user account (e.g., user interface content that shows which rules are activated).

The URL group selector 240 is an interface element that enables user input to select a particular group of URLs to which URL normalization settings may apply. In the FIG. 2 illustration, the URL group selector 240 lists a catch-all URL group “(All URLs)” and four specific groups of URLs: http://websiteA.com/catalog/*, http://websiteA.com/blog/*, http://websiteB.com/page/*, and http://websiteB.com/*. As indicated by a top portion of the pull-down menu which comprises the URL group selector 240, user input has currently selected the URL group http://websiteA.com/catalog/*. This selection means that the currently-displayed rule activations are specific to the selected URL group. Changing the selected URL group may cause the user interface to show the same three lists of URL normalization rules 210, 220, and 230, but with different specific rules activated.

The “(All URLs)” URL group represents all URLs. As such, rules activated for the “(All URLs)” setting can apply to all pages. In some implementations, this includes pages that fall into other URL groups. In some implementations, this includes only those pages that do not fall into the other URL groups.

The “http://websiteA.com/catalog/*” URL group represents all webpages with a base path of “http://websiteA.com/catalog/” (the “*” symbol represents a wildcard character).

The “http://websiteA.com/blog/*” URL group represents all webpages with a base path of “http://websiteA.com/blog/”. Since the URL group selector 240 does not show a URL group for “http://websiteA.com/*”, all webpages for the websiteA.com website that do not belong to the “catalog” path or the “blog” path may be governed by the rules activated for the “(All URLs)” URL group.

The “http://websiteB.com/page/*” URL group represents all webpages with a base path of “http://websiteB.com/page/”.

The “http://websiteB.com/*” URL group represents all webpages from the websiteB.com website. In some implementations, this URL group excludes pages that belong to the “page” path, since there is a separate URL group for that specific portion of the website.

Although not shown in FIG. 2, the user interface may enable an individual to add and remove URL groups, for example, by entering text into a text field to specify at least part of a URL that defines the URL group.
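One way to assign an input URL to a URL group is sketched below. It assumes that each group pattern ends in a single trailing “*” wildcard, and it models the implementation in which a more specific group takes precedence over a more general one, with “(All URLs)” as the fallback. The function name find_url_group and the longest-prefix tie-breaking are assumptions for illustration.

```python
ALL_URLS = "(All URLs)"

# The four specific URL groups listed by the URL group selector 240.
URL_GROUPS = [
    "http://websiteA.com/catalog/*",
    "http://websiteA.com/blog/*",
    "http://websiteB.com/page/*",
    "http://websiteB.com/*",
]

def find_url_group(url, groups=URL_GROUPS):
    """Return the most specific group whose base path prefixes the URL."""
    matches = [g for g in groups if g.endswith("*") and url.startswith(g[:-1])]
    # Assumption: the longest matching base path is the most specific group.
    return max(matches, key=len) if matches else ALL_URLS
```

Under this sketch, a URL such as "http://websiteB.com/page/3" falls into the “page” group rather than the broader websiteB.com group, and a websiteA.com URL outside the “catalog” and “blog” paths falls back to “(All URLs)”.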

The list of default rules 210 lists six URL normalization rules, designated A1-A6. These six rules may be default rules that the URL normalizing system 110 applies to all input URLs, regardless of the user account for which a URL normalization is being performed and regardless of whether the input URL belongs to any particular URL group.

Application of Rule A1 includes changing any characters in a domain name portion of an input URL from upper case to lower case.

Application of Rule A2 includes removing a default port from an input URL, for example removing the characters “80” for http and “443” for https.

Application of Rule A3 includes percent encoding the path portion of an input URL, for example, encoding all non-ASCII characters in the path portion with percent encoding (e.g., using UTF-8).

Application of Rule A4 includes percent encoding query parameters in an input URL, for example, encoding all non-ASCII characters in any query parameters in an input URL with percent encoding (e.g., using UTF-8).

Application of Rule A5 includes normalizing spaces in any query parameters in an input URL.

Application of Rule A6 includes normalizing the case of percent encoding in an input URL, for example, changing “%2F” to “%2f”.
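The default rules can be approximated in a few lines with Python's standard library. The sketch below covers Rules A1, A2, A3, A4, and A6 only; Rule A5 is omitted because the text does not say how spaces are normalized. The function names are hypothetical, and the sketch ignores user information and IPv6 literals in the authority portion.

```python
import re
from urllib.parse import quote, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def _encode_non_ascii(text: str) -> str:
    # Rules A3/A4: percent-encode only non-ASCII characters, as UTF-8.
    return "".join(ch if ord(ch) < 128 else quote(ch) for ch in text)

def _lowercase_percent_encoding(text: str) -> str:
    # Rule A6: e.g. "%2F" becomes "%2f".
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).lower(), text)

def apply_default_rules(url: str) -> str:
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the host (Rule A1)
    port = parts.port
    if port is not None and DEFAULT_PORTS.get(parts.scheme) == port:
        port = None  # Rule A2: drop the scheme's default port
    netloc = host if port is None else f"{host}:{port}"
    path = _lowercase_percent_encoding(_encode_non_ascii(parts.path))
    query = _lowercase_percent_encoding(_encode_non_ascii(parts.query))
    return urlunsplit((parts.scheme, netloc, path, query, parts.fragment))
```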

The list of optional rules 220 lists five URL normalization rules. The listing 220 includes, for each rule, a text portion 270 and an activation interface element 260. The text portion 270 includes text to describe the corresponding rule, and the activation interface element 260 is user selectable to toggle a rule between a deactivated status and an activated status. Any activation may apply to only (1) a currently-logged in user account 250, and (2) a currently-selected URL group that is currently selected using the URL group selector 240.

Application of Rule B1 includes removing any trailing slashes in a URL (e.g., changing “http://example.com/folder/” to “http://example.com/folder”).

Application of Rule B2 includes ordering query parameters. For example, should an input URL include multiple query parameters, the computing system can order them in alphabetical order by parameter name. Should two parameters have the same name, the system may order the parameters by value.

Application of Rule B3 includes changing “http://” in an input URL to “https://”.

Application of Rule B4 includes lowercasing any uppercase characters in the path portion of an input URL.

Application of Rule B5 includes removing the fragment of any input URL. For example, the computing system may change the URL “http://example.com/folder#section” to “http://example.com/folder”.
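Each optional rule can be modeled as a small function from URL to URL, as in the sketch below. The function names and the OPTIONAL_RULES mapping are hypothetical. One simplification to note: the Rule B2 sketch decodes and re-encodes the query string, which can change how special characters are escaped.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def remove_trailing_slash(url):    # Rule B1
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path.rstrip("/")))

def order_query_parameters(url):   # Rule B2
    parts = urlsplit(url)
    # Sorting (name, value) pairs orders by name first, then by value.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(pairs)))

def http_to_https(url):            # Rule B3
    parts = urlsplit(url)
    return urlunsplit(parts._replace(scheme="https")) if parts.scheme == "http" else url

def lowercase_path(url):           # Rule B4
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path.lower()))

def remove_fragment(url):          # Rule B5
    return urlunsplit(urlsplit(url)._replace(fragment=""))

# Hypothetical lookup table from rule ID to rule function.
OPTIONAL_RULES = {
    "B1": remove_trailing_slash,
    "B2": order_query_parameters,
    "B3": http_to_https,
    "B4": lowercase_path,
    "B5": remove_fragment,
}
```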

The list of optional rewrite rules 230 lists four URL normalization rules. The listing 230 includes, for each rule, a text portion and a corresponding activation interface element. The optional rewrite rules 230 differ from the optional rules 220, because the optional rewrite rules 230 require user entry of text that defines part of an application of the corresponding rule. Optional rules 220 operate in a defined manner, and user interaction with the listing of optional rules 220 may only be able to activate or deactivate application of the optional rules 220 (not changing how the rules are executed). In contrast, responsive to user input activating an optional rewrite rule, for example as illustrated in FIG. 2 with Rule C2, the user interface presents a text input field 280 that enables user input to specify text that is used as part of the operation of the corresponding rule.

Application of Rule C1 includes removing a trailing path component that matches a user-specified string. For example, should user input specify the text “index.html” in a text field that appears upon user-activation of Rule C1, application of the rule may change the URL “http://example.com/index.html” to “http://example.com/”.

Application of Rule C2 includes removing a subdomain matching a user-specified string. For example, should user input specify the text “www” in a text field that appears upon user-activation of Rule C2, application of the rule may change the URL “http://www.example.com” to “http://example.com”.

Application of Rule C3 includes removing query parameters from an input URL that match a user-specified string or regex. For example, should user input specify the string “orderid” in a text field that appears upon user-activation of Rule C3, application of the rule may change the URL “http://example.com/?orderid=72” to “http://example.com/”.

Application of Rule C4 includes renaming a subdomain that matches a user-specified string within an input URL. For example, should user input specify the strings “serverA” and “serverB” in text fields that appear upon user-activation of Rule C4, application of the rule may change the URL “http://serverA.example.com/” to “http://serverB.example.com/”.
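As a non-limiting illustration, the optional rewrite rules C1-C4 might be sketched in Python as functions that take the user-specified text as an additional argument. The function and parameter names are hypothetical. The sketch assumes that a subdomain is the leading label of the host, that Rule C3 matches parameter names (treating the user-specified text as a regular expression), and that URLs carry no user-information portion.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def remove_trailing_path_component(url: str, component: str) -> str:
    """Rule C1 (sketch): drop the last path component if it equals `component`."""
    parts = urlsplit(url)
    head, sep, tail = parts.path.rpartition("/")
    if tail == component:
        parts = parts._replace(path=head + sep)
    return urlunsplit(parts)


def remove_subdomain(url: str, subdomain: str) -> str:
    """Rule C2 (sketch): drop a leading host label equal to `subdomain`."""
    parts = urlsplit(url)
    prefix = subdomain + "."
    if parts.netloc.startswith(prefix):
        parts = parts._replace(netloc=parts.netloc[len(prefix):])
    return urlunsplit(parts)


def remove_query_parameters(url: str, pattern: str) -> str:
    """Rule C3 (sketch): drop parameters whose name fully matches `pattern`."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not re.fullmatch(pattern, name)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def rename_subdomain(url: str, old: str, new: str) -> str:
    """Rule C4 (sketch): replace a leading host label `old` with `new`."""
    parts = urlsplit(url)
    prefix = old + "."
    if parts.netloc.startswith(prefix):
        parts = parts._replace(netloc=new + "." + parts.netloc[len(prefix):])
    return urlunsplit(parts)
```

Because a plain string such as “orderid” is also a valid regular expression, the single `pattern` argument in this sketch covers both the string case and the regex case described for Rule C3.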

FIG. 3 shows a flowchart of operations in which user input selects normalization rules. The operations described with respect to the FIG. 3 flowchart may be performed by the URL normalizing system 110 to generate a presentation of the FIG. 2 user interface on a user computing device.

At box 310, the computing system provides information to a computing device to cause the computing device to present a user interface to activate normalization rules. For example, the URL normalizing system 110 may provide files to a user computing device to cause a web browser of the user computing device to present the user interface shown in FIG. 2. The user computing device may be operated by a company that operates the URL normalizing system 110 or a customer of that company. The information sent to the computing device can include a web page HTML file.

At box 320, the user interface presents content that indicates selectable URL normalization rules. For example, the FIG. 2 user interface includes text that describes each of the rules B1-5 and C1-4, all of which are user-selectable.

At box 330, the user interface presents corresponding activation elements. For example, each of the rules B1-5 and C1-4 is accompanied by an interface element that is user-selectable to cause activation of the corresponding rule (e.g., changing the text “Activate” to “Activated” and changing the interface element from having no shading to being shaded).

At box 340, the user interface presents content that indicates default normalization rules. For example, the FIG. 2 user interface includes text that describes each of the rules A1-6.

At box 350, the user interface includes an interface element that enables user input to select a group of URLs from among multiple groups of URLs. For example, the URL group selector 240 enables user input to select any of five illustrated groups of URLs, at least some of which may have been specified by user input provided by a device logged into the “CompanyA” user account.

At box 360, the computing system receives an indication that user input at the computing device interacted with activation interface elements to activate a selected set of URL normalization rules. For example, user input at a client computing device may have selected some of the rules B1-5 and C1-4 through user selection of corresponding ones of the activation interface elements 260.

At box 370, user input interacted with an activation element without entry of user-specified text. For example, user input selected an activation interface element for one of the rules B1-5. Such an activation simply toggles the corresponding rule from “off” to “on”, and does not involve user entry of text to define how the rule executes.

At box 380, user input interacted with an activation interface element to enter user-specified text. For example, user input may enter text into the text field 280 (a portion of the activation interface element), to cause the rule C2 to only apply to input URLs that include the text that user input specified into the text field 280.

FIGS. 4A-B show a flowchart of operations in which a URL is normalized according to a selected set of rules. The operations described with respect to the FIGS. 4A-B flowchart may be performed by the URL normalizing system 110 to normalize a URL.

At box 410, a computing system receives a request to normalize an input URL. FIG. 1 illustrates two examples of such a request with the arrows extending from the website analyzers 120 and 130 to the URL normalizing system 110. The request may specify the input URL, which may be a URL with text that has not been normalized.

At box 420, the computing system identifies a user account to which the request to normalize the URL relates. The user account may be a user account for which the input URL is stored. For example, the user account may be with a website analysis system, and may have been created by an owner of a particular website. The owner of the particular website may request that the website analysis system analyze the particular website. The analysis may involve storage of analysis information in conjunction with various URLs that represent pages from the particular website (e.g., with analysis information indicating spelling errors and URLs of the pages on which those spelling errors occur).

At box 422, the computing system may identify the user account corresponding to the request to normalize the input URL, due to the request specifying the user account (e.g., the request including the user-account-identifying text “CompanyA”).

At box 430, the computing system may determine a selected set of URL normalization rules that are activated for the user account. For example, the rule selector 112 may use an indication of the user account to access a portion of the account-specific rule activations storage 160 that indicates which rules have been activated for the user account (e.g., based on settings specified through user interaction with the FIG. 2 user interface). The rule selector 112 may then retrieve the actual rules that have been activated from the collection of URL normalizing rules storage 150. The activated rules may include all rules from the default rules listing 210, and one or more rules from each of the optional rules listing 220 and the optional rewrite rules listing 230.

At box 432, the computing system identifies that the input URL is part of a group of URLs for which URL normalization rules have been specified. For example, the rule selector 112 may determine that the input URL to be normalized is “http://websiteA.com/catalog/product7”, and therefore that the input URL falls within the URL group “http://websiteA.com/catalog/*”.

At box 434, the computing system identifies URL normalization rules that are activated for the group of URLs. For example, the FIG. 2 user interface shows that, for the URL group “http://websiteA.com/catalog/*”, the activated rules include the default rules A1-6, the optional rule B2, and the optional rewrite rule C2. In some examples, the computing system only identifies rules that are activated for a user account, and URL groups with different sets of activation rules are not configured for a particular user account.
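As a non-limiting illustration, the rule selection of boxes 430-434 might be sketched in Python as a lookup keyed first by user account and then by URL group. The `ACTIVATIONS` table is a hypothetical stand-in for the account-specific rule activations storage 160, the catch-all “*” group and its Rule B1 activation are invented for illustration, and matching URL groups with shell-style wildcards is an assumption of this sketch.

```python
from fnmatch import fnmatchcase

# Default rules A1-6 are always applied (identifiers only; bodies omitted).
DEFAULT_RULES = ["A1", "A2", "A3", "A4", "A5", "A6"]

# Hypothetical activations: account -> URL group pattern -> optional rules.
# More specific groups are listed first so that they take precedence.
ACTIVATIONS = {
    "CompanyA": {
        "http://websiteA.com/catalog/*": ["B2", "C2"],
        "*": ["B1"],  # illustrative catch-all group
    },
}


def select_rules(account: str, input_url: str) -> list[str]:
    """Return identifiers of the rules activated for this account and URL."""
    groups = ACTIVATIONS.get(account, {})
    for pattern, optional_rules in groups.items():
        if fnmatchcase(input_url, pattern):
            return DEFAULT_RULES + optional_rules
    # No group matched (or unknown account): fall back to default rules.
    return list(DEFAULT_RULES)
```

For the input URL “http://websiteA.com/catalog/product7” and the “CompanyA” account, this sketch selects rules A1-6, B2, and C2, consistent with the example above.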

At box 440, the computing system normalizes the input URL using the selected set of URL normalization rules, to generate a normalized URL. For example, the normalizer 114 may apply each of rules A1-6, B2, and C2, one after another, to convert the input URL to a normalized URL.
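As a non-limiting illustration, applying rules one after another might be sketched in Python as a simple loop in which the output of each rule becomes the input of the next. The two rule functions are simplified, hypothetical stand-ins, and the `normalize` function is an assumption of this sketch rather than the implementation of the normalizer 114.

```python
from typing import Callable, Iterable

Rule = Callable[[str], str]


def strip_fragment(url: str) -> str:
    """Simplified stand-in for a fragment-removal rule."""
    return url.split("#", 1)[0]


def strip_trailing_slash(url: str) -> str:
    """Simplified stand-in for a trailing-slash-removal rule."""
    return url.rstrip("/")


def normalize(input_url: str, rules: Iterable[Rule]) -> str:
    """Apply each rule in order; each rule sees the previous rule's output."""
    url = input_url
    for rule in rules:
        url = rule(url)
    return url


# The order of application can matter: removing the fragment first
# exposes the trailing slash to the second rule.
ordered = normalize(
    "http://example.com/folder/#section",
    [strip_fragment, strip_trailing_slash],
)
reversed_order = normalize(
    "http://example.com/folder/#section",
    [strip_trailing_slash, strip_fragment],
)
```

In this sketch, `ordered` and `reversed_order` differ, which suggests that a system applying rules sequentially may benefit from a fixed, well-defined rule order.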

At box 450, the computing system may hash the normalized URL to identify a hashed value. For example, the hasher 116 may input the normalized URL into a hash function to generate a 128-bit numerical hash value.

At box 452, the hashed value may be used as an indication of the normalized URL that is associated with stored website analysis information. In other embodiments, the computing system assigns an artificial identifier to the normalized URL (e.g., a first-ever normalized URL is assigned an identifier of “00001”, a second-ever normalized URL is assigned an identifier of “00002,” and so forth). Using a hashed value instead of an artificial identifier can eliminate a need to maintain a database of normalized URLs and their corresponding identifiers, and can eliminate a need to perform database lookups to transform normalized URLs into numerical identifiers.
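As a non-limiting illustration, generating a 128-bit numerical hash value from a normalized URL might be sketched in Python as follows. This document does not name a particular hash function; the use of BLAKE2b with a 16-byte digest, the UTF-8 encoding, and the function name are assumptions of this sketch.

```python
import hashlib


def hash_normalized_url(normalized_url: str) -> int:
    """Return a 128-bit integer derived from the normalized URL."""
    # digest_size=16 yields 16 bytes, i.e., 128 bits.
    digest = hashlib.blake2b(
        normalized_url.encode("utf-8"), digest_size=16
    ).digest()
    return int.from_bytes(digest, "big")
```

Because the value is computed from the normalized URL alone, any component holding the same normalized URL can derive the same identifier without consulting a table of previously assigned identifiers.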

At box 454, in some implementations, the hashed value is generated based on a combination of all rules activated for the user account. For example, the URL normalization rules that are used to normalize a URL before being hashed may include a union of all rules that have been activated for a user account (e.g., all activations across all URL groups for the user account). The normalized URL that is used as an input to a hash function may therefore have been normalized with more rules than the normalized URL that is presented to a user for a given input URL.

At box 460, the computing system stores information that results from an analysis of the input URL in association with the indication of the normalized URL. The indication of the normalized URL may be the normalized URL itself, the above-described hash value generated from the normalized URL, and/or the above-described artificial identifier.

FIG. 5 shows a flowchart of operations to request information using an indication of a normalized URL. The operations described with respect to the FIG. 5 flowchart may be performed by either of the website analyzers 120 and 130. In FIG. 1, the request is illustrated as being performed by the second type of website analyzer 130.

At box 510, the computing system receives a request for information stored in association with the indication of the normalized URL. For example, the website analyzer 130 may send a request to the storage 170 for information. The request may specify an indication of the normalized URL. For example, the request may include a hash value that was generated from a normalized URL (box 520).

The request by the website analyzer 130 may be in response to a request that it had received from a user device, where the request from the user device may have included a URL. The website analyzer 130 may have sent the URL to the URL normalizing system 110 and received back a hash value of a normalized version of the URL. It is this hash value that the website analyzer 130 may then send to storage 170 in order to fetch information stored for the normalized version of the URL.

At box 530, the computing system accesses the information stored in association with the indication of the normalized URL. For example, the website analyzer 130 uses the hash value (or the normalized URL, or an artificial identifier) in a database lookup to access analysis information stored in association with the hash value. The analysis information may include information that indicates quality issues with a webpage represented by the normalized URL (e.g., issues with the webpage such as spelling errors, broken links, accessibility issues). Accessing such information may transfer such information from the storage 170 to memory of the website analyzer 130.

At box 540, in some examples the normalized URL is a normalized version of two different input URLs. For example, two different URLs may include different subdomains before the URLs are sent to the URL normalizing system 110, but URL normalization rules applied to the two URLs may convert both URLs to the same underlying normalized URL.

At box 560, the accessed information includes information stored from analyses performed in relation to the two different URLs. For example, the request by the website analyzer 130 for information stored in association with “norm_URL#1” may return “analysis_info1” from storage 140 and “analysis_infoA” from storage 170. These different analyses may have been conducted by or received at different website analyzers (e.g., the first type of website analyzer 120, which may perform or request an accessibility analysis, and the second type of website analyzer 130, which may perform or request a quality analysis).

As such, the overall computing system (e.g., the collection of computing components shown in FIG. 1) may perform and receive many different types of analyses that use inconsistent variations of URLs for different webpages, and the system stores such analysis information that arrives from different sources in association with an indication of a single normalized URL.
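As a non-limiting illustration, storing analysis information from different sources under a single normalized-URL key might be sketched in Python as follows. The `AnalysisStore` class, its method names, and the toy normalizer are hypothetical; the sketch uses one in-memory mapping for brevity, whereas FIG. 1 shows separate storages 140 and 170.

```python
from collections import defaultdict
from typing import Callable


def toy_normalize(url: str) -> str:
    """Toy normalizer: drop a leading "www." label and trailing slashes."""
    return url.replace("://www.", "://", 1).rstrip("/")


class AnalysisStore:
    """Keeps analysis records keyed by an indication of a normalized URL."""

    def __init__(self, normalize: Callable[[str], str]) -> None:
        self._normalize = normalize
        self._records: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def store(self, input_url: str, source: str, info: str) -> None:
        # Variations of a URL collapse to the same key before storage.
        self._records[self._normalize(input_url)].append((source, info))

    def lookup(self, input_url: str) -> list[tuple[str, str]]:
        return list(self._records.get(self._normalize(input_url), []))


store = AnalysisStore(toy_normalize)
store.store("http://www.example.com/page/", "accessibility", "analysis_info1")
store.store("http://example.com/page", "quality", "analysis_infoA")
results = store.lookup("http://www.example.com/page")
```

In this sketch, `results` contains both records even though the two analyses were stored using different variations of the URL.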

This example, of multiple different URLs being variations of each other (representing the same underlying webpage) that are normalized to a same normalized URL, is illustrated in FIG. 6A.

This is in distinction to a potentially more common occurrence, in which two different URLs that are not variations of each other (e.g., they relate to two completely different webpages from different portions of a website) result in two completely different normalized URLs, despite using the same set of normalization rules, as illustrated in FIG. 6B.

Another possible occurrence is that the same input URL can be normalized to different normalized URLs, based on the input URL going through different sets of normalization rules, as illustrated in FIG. 6C. Such a situation may occur where two different user accounts request a quality analysis of a same webpage. In such a situation, the system may receive similar/same analysis information regarding the webpage from a third-party system, but the user accounts with the computing system may have different sets of normalization rules activated. As such, the computing system may store the received webpage analysis information in association with different normalized URLs for the different user accounts.
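As a non-limiting illustration, the three occurrences of FIGS. 6A-6C might be reproduced in Python with two simplified rules and two hypothetical per-account rule sets. The rule functions, the account rule sets, and the example URLs are assumptions of this sketch.

```python
def drop_www(url: str) -> str:
    """Simplified rule: remove a leading "www." subdomain."""
    return url.replace("://www.", "://", 1)


def force_https(url: str) -> str:
    """Simplified rule: change "http://" to "https://"."""
    if url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url


def normalize(url: str, rules) -> str:
    for rule in rules:
        url = rule(url)
    return url


# Hypothetical rule sets activated for two different user accounts.
account_a_rules = [drop_www]
account_b_rules = [drop_www, force_https]

# FIG. 6A: two variations of one page yield the same normalized URL.
same_page = (
    normalize("http://www.example.com/page", account_a_rules),
    normalize("http://example.com/page", account_a_rules),
)

# FIG. 6B: two different pages yield different normalized URLs.
different_pages = (
    normalize("http://example.com/page", account_a_rules),
    normalize("http://example.com/contact", account_a_rules),
)

# FIG. 6C: one input URL, different rule sets, different normalized URLs.
different_accounts = (
    normalize("http://www.example.com/page", account_a_rules),
    normalize("http://www.example.com/page", account_b_rules),
)
```

In the FIG. 6C case of this sketch, analysis information for the same input URL would be stored under different keys for the two accounts.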

Referring now to FIG. 7, a conceptual diagram of a system that may be used to implement the systems and methods described in this document is illustrated. In the system, mobile computing device 710 can wirelessly communicate with base station 740, which can provide the mobile computing device wireless access to numerous hosted services 760 through a network 750.

In this illustration, the mobile computing device 710 is depicted as a handheld mobile telephone (e.g., a smartphone, or an application telephone) that includes a touchscreen display device 712 for presenting content to a user of the mobile computing device 710 and receiving touch-based user inputs and/or presence-sensitive user input (e.g., as detected over a surface of the computing device using radar detectors mounted in the mobile computing device 710). Other visual, tactile, and auditory output components may also be provided (e.g., LED lights, a vibrating mechanism for tactile output, or a speaker for providing tonal, voice-generated, or recorded output), as may various different input components (e.g., keyboard 714, physical buttons, trackballs, accelerometers, gyroscopes, and magnetometers).

An example visual output mechanism, display device 712, may take the form of a display with resistive or capacitive touch capabilities. The display device may be for displaying video, graphics, images, and text, and for coordinating user touch input locations with the location of displayed information so that the device 710 can associate user contact at a location of a displayed item with the item. The mobile computing device 710 may also take alternative forms, including as a laptop computer, a tablet or slate computer, a personal digital assistant, an embedded system (e.g., a car navigation system), a desktop personal computer, or a computerized workstation.

An example mechanism for receiving user-input includes keyboard 714, which may be a full qwerty keyboard or a traditional keypad that includes keys for the digits ‘0-9’, ‘*’, and ‘#.’ The keyboard 714 receives input when a user physically contacts or depresses a keyboard key. User manipulation of a trackball 716 or interaction with a track pad enables the user to supply directional and rate of movement information to the mobile computing device 710 (e.g., to manipulate a position of a cursor on the display device 712).

The mobile computing device 710 may be able to determine a position of physical contact with the touchscreen display device 712 (e.g., a position of contact by a finger or a stylus). Using the touchscreen 712, various “virtual” input mechanisms may be produced, where a user interacts with a graphical user interface element depicted on the touchscreen 712 by contacting the graphical user interface element. An example of a “virtual” input mechanism is a “software keyboard,” where a keyboard is displayed on the touchscreen and a user selects keys by pressing a region of the touchscreen 712 that corresponds to each key.

The mobile computing device 710 may include mechanical or touch sensitive buttons 718a-d. Additionally, the mobile computing device may include buttons for adjusting volume output by the one or more speakers 720, and a button for turning the mobile computing device on or off. A microphone 722 allows the mobile computing device 710 to convert audible sounds into an electrical signal that may be digitally encoded and stored in computer-readable memory, or transmitted to another computing device. The mobile computing device 710 may also include a digital compass, an accelerometer, proximity sensors, and ambient light sensors.

An operating system may provide an interface between the mobile computing device's hardware (e.g., the input/output mechanisms and a processor executing instructions retrieved from computer-readable medium) and software.

Example operating systems include ANDROID, CHROME, IOS, MAC OS X, WINDOWS 7, WINDOWS PHONE 7, SYMBIAN, BLACKBERRY, WEBOS, a variety of UNIX operating systems, or a proprietary operating system for computerized devices. The operating system may provide a platform for the execution of application programs that facilitate interaction between the computing device and a user.

The mobile computing device 710 may present a graphical user interface with the touchscreen 712. A graphical user interface is a collection of one or more graphical interface elements and may be static (e.g., the display appears to remain the same over a period of time), or may be dynamic (e.g., the graphical user interface includes graphical interface elements that animate without user input).

A graphical interface element may be text, lines, shapes, images, or combinations thereof. For example, a graphical interface element may be an icon that is displayed on the desktop and the icon's associated text. In some examples, a graphical interface element is selectable with user-input. For example, a user may select a graphical interface element by pressing a region of the touchscreen that corresponds to a display of the graphical interface element. In some examples, the user may manipulate a trackball to highlight a single graphical interface element as having focus. User-selection of a graphical interface element may invoke a pre-defined action by the mobile computing device. In some examples, selectable graphical interface elements further or alternatively correspond to a button on the keyboard 714. User-selection of the button may invoke the pre-defined action.

In some examples, the operating system provides a “desktop” graphical user interface that is displayed after turning on the mobile computing device 710, after activating the mobile computing device 710 from a sleep state, after “unlocking” the mobile computing device 710, or after receiving user-selection of the “home” button 718c. The desktop graphical user interface may display several graphical interface elements that, when selected, invoke corresponding application programs. An invoked application program may present a graphical interface that replaces the desktop graphical user interface until the application program terminates or is hidden from view.

User-input may influence an executing sequence of mobile computing device 710 operations. For example, a single-action user input (e.g., a single tap of the touchscreen, swipe across the touchscreen, contact with a button, or combination of these occurring at a same time) may invoke an operation that changes a display of the user interface. Without the user-input, the user interface may not have changed at a particular time. For example, a multi-touch user input with the touchscreen 712 may invoke a mapping application to “zoom-in” on a location, even though the mapping application may have by default zoomed-in after several seconds.

The desktop graphical interface can also display “widgets.” A widget is one or more graphical interface elements that are associated with an application program that is executing, and that display on the desktop content controlled by the executing application program. A widget's application program may launch as the mobile device turns on. Further, a widget may not take focus of the full display. Instead, a widget may only “own” a small portion of the desktop, displaying content and receiving touchscreen user-input within the portion of the desktop.

The mobile computing device 710 may include one or more location-identification mechanisms. A location-identification mechanism may include a collection of hardware and software that provides the operating system and application programs an estimate of the mobile device's geographical position. A location-identification mechanism may employ satellite-based positioning techniques, base station transmitting antenna identification, multiple base station triangulation, internet access point IP location determinations, inferential identification of a user's position based on search engine queries, and user-supplied identification of location (e.g., by receiving a user “check in” to a location).

The mobile computing device 710 may include other applications, computing sub-systems, and hardware. A call handling unit may receive an indication of an incoming telephone call and provide a user the capability to answer the incoming telephone call. A media player may allow a user to listen to music or play movies that are stored in local memory of the mobile computing device 710. The mobile computing device 710 may include a digital camera sensor, and corresponding image and video capture and editing software. An internet browser may enable the user to view content from a web page by typing in an address corresponding to the web page or selecting a link to the web page.

The mobile computing device 710 may include an antenna to wirelessly communicate information with the base station 740. The base station 740 may be one of many base stations in a collection of base stations (e.g., a mobile telephone cellular network) that enables the mobile computing device 710 to maintain communication with a network 750 as the mobile computing device is geographically moved. The computing device 710 may alternatively or additionally communicate with the network 750 through a Wi-Fi router or a wired connection (e.g., ETHERNET, USB, or FIREWIRE). The computing device 710 may also wirelessly communicate with other computing devices using BLUETOOTH protocols, or may employ an ad-hoc wireless network.

A service provider that operates the network of base stations may connect the mobile computing device 710 to the network 750 to enable communication between the mobile computing device 710 and other computing systems that provide services 760. Although the services 760 may be provided over different networks (e.g., the service provider's internal network, the Public Switched Telephone Network, and the Internet), network 750 is illustrated as a single network. The service provider may operate a server system 752 that routes information packets and voice data between the mobile computing device 710 and computing systems associated with the services 760.

The network 750 may connect the mobile computing device 710 to the Public Switched Telephone Network (PSTN) 762 in order to establish voice or fax communication between the mobile computing device 710 and another computing device. For example, the service provider server system 752 may receive an indication from the PSTN 762 of an incoming call for the mobile computing device 710. Conversely, the mobile computing device 710 may send a communication to the service provider server system 752 initiating a telephone call using a telephone number that is associated with a device accessible through the PSTN 762.

The network 750 may connect the mobile computing device 710 with a Voice over Internet Protocol (VoIP) service 764 that routes voice communications over an IP network, as opposed to the PSTN. For example, a user of the mobile computing device 710 may invoke a VoIP application and initiate a call using the program. The service provider server system 752 may forward voice data from the call to a VoIP service, which may route the call over the internet to a corresponding computing device, potentially using the PSTN for a final leg of the connection.

An application store 766 may provide a user of the mobile computing device 710 the ability to browse a list of remotely stored application programs that the user may download over the network 750 and install on the mobile computing device 710. The application store 766 may serve as a repository of applications developed by third-party application developers. An application program that is installed on the mobile computing device 710 may be able to communicate over the network 750 with server systems that are designated for the application program. For example, a VoIP application program may be downloaded from the Application Store 766, enabling the user to communicate with the VoIP service 764.

The mobile computing device 710 may access content on the internet 768 through network 750. For example, a user of the mobile computing device 710 may invoke a web browser application that requests data from remote computing devices that are accessible at designated uniform resource locators. In various examples, some of the services 760 are accessible over the internet.

The mobile computing device may communicate with a personal computer 770. For example, the personal computer 770 may be the home computer for a user of the mobile computing device 710. Thus, the user may be able to stream media from his personal computer 770. The user may also view the file structure of his personal computer 770, and transmit selected documents between the computerized devices.

A voice recognition service 772 may receive voice communication data recorded with the mobile computing device's microphone 722, and translate the voice communication into corresponding textual data. In some examples, the translated text is provided to a search engine as a web query, and responsive search engine search results are transmitted to the mobile computing device 710.

The mobile computing device 710 may communicate with a social network 774. The social network may include numerous members, some of whom have agreed to be related as acquaintances. Application programs on the mobile computing device 710 may access the social network 774 to retrieve information based on the acquaintances of the user of the mobile computing device. For example, an “address book” application program may retrieve telephone numbers for the user's acquaintances. In various examples, content may be delivered to the mobile computing device 710 based on social network distances from the user to other members in a social network graph of members and connecting relationships. For example, advertisement and news article content may be selected for the user based on a level of interaction with such content by members that are “close” to the user (e.g., members that are “friends” or “friends of friends”).

The mobile computing device 710 may access a personal set of contacts776 through network 750. Each contact may identify an individual andinclude information about that individual (e.g., a phone number, anemail address, and a birthday). Because the set of contacts is hostedremotely to the mobile computing device 710, the user may access andmaintain the contacts 776 across several devices as a common set ofcontacts.

The mobile computing device 710 may access cloud-based applicationprograms 778. Cloud-computing provides application programs (e.g., aword processor or an email program) that are hosted remotely from themobile computing device 710, and may be accessed by the device 710 usinga web browser or a dedicated program. Example cloud-based applicationprograms include GOOGLE DOCS word processor and spreadsheet service,GOOGLE GMAIL webmail service, and PICASA picture manager.

Mapping service 780 can provide the mobile computing device 710 withstreet maps, route planning information, and satellite images. Anexample mapping service is GOOGLE MAPS. The mapping service 780 may alsoreceive queries and return location-specific results. For example, themobile computing device 710 may send an estimated location of the mobilecomputing device and a user-entered query for “pizza places” to themapping service 780. The mapping service 780 may return a street mapwith “markers” superimposed on the map that identify geographicallocations of nearby “pizza places.”

Turn-by-turn service 782 may provide the mobile computing device 710with turn-by-turn directions to a user-supplied destination. Forexample, the turn-by-turn service 782 may stream to device 710 astreet-level view of an estimated location of the device, along withdata for providing audio commands and superimposing arrows that direct auser of the device 710 to the destination.

Various forms of streaming media 784 may be requested by the mobilecomputing device 710. For example, computing device 710 may request astream for a pre-recorded video file, a live television program, or alive radio program. Example services that provide streaming mediainclude YOUTUBE and PANDORA.

A micro-blogging service 786 may receive from the mobile computingdevice 710 a user-input post that does not identify recipients of thepost. The micro-blogging service 786 may disseminate the post to othermembers of the micro-blogging service 786 that agreed to subscribe tothe user.

A search engine 788 may receive user-entered textual or verbal queriesfrom the mobile computing device 710, determine a set ofinternet-accessible documents that are responsive to the query, andprovide to the device 710 information to display a list of searchresults for the responsive documents. In examples where a verbal queryis received, the voice recognition service 772 may translate thereceived audio into a textual query that is sent to the search engine.

These and other services may be implemented in a server system 790. Aserver system may be a combination of hardware and software thatprovides a service or a set of services. For example, a set ofphysically separate and networked computerized devices may operatetogether as a logical server system unit to handle the operationsnecessary to offer a service to hundreds of computing devices. A serversystem is also referred to herein as a computing system.

In various implementations, operations that are performed “in responseto” or “as a consequence of” another operation (e.g., a determination oran identification) are not performed if the prior operation isunsuccessful (e.g., if the determination was not performed). Operationsthat are performed “automatically” are operations that are performedwithout user intervention (e.g., intervening user input). Features inthis document that are described with conditional language may describeimplementations that are optional. In some examples, “transmitting” froma first device to a second device includes the first device placing datainto a network for receipt by the second device, but may not include thesecond device receiving the data. Conversely, “receiving” from a firstdevice may include receiving the data from a network, but may notinclude the first device transmitting the data.

“Determining” by a computing system can include the computing systemrequesting that another device perform the determination and supply theresults to the computing system. Moreover, “displaying” or “presenting”by a computing system can include the computing system sending data forcausing another device to display or present the referenced information.

FIG. 8 is a block diagram of computing devices 800, 850 that may be used to implement the systems and methods described in this document, as either a client or as a server or plurality of servers. Computing device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Computing device 850 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices. The components shown here, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations described and/or claimed in this document.

Computing device 800 includes a processor 802, memory 804, a storage device 806, a high-speed controller 808 connecting to memory 804 and high-speed expansion ports 810, and a low-speed controller 812 connecting to low-speed expansion port 814 and storage device 806. Each of the components 802, 804, 806, 808, 810, and 812 is interconnected using various buses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 802 can process instructions for execution within the computing device 800, including instructions stored in the memory 804 or on the storage device 806 to display graphical information for a GUI on an external input/output device, such as display 816 coupled to high-speed controller 808. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 800 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).

The memory 804 stores information within the computing device 800. In one implementation, the memory 804 is a volatile memory unit or units. In another implementation, the memory 804 is a non-volatile memory unit or units. The memory 804 may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device 806 is capable of providing mass storage for the computing device 800. In one implementation, the storage device 806 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. A computer program product can be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 804, the storage device 806, or memory on processor 802.

The high-speed controller 808 manages bandwidth-intensive operations for the computing device 800, while the low-speed controller 812 manages lower bandwidth-intensive operations. Such allocation of functions is an example only. In one implementation, the high-speed controller 808 is coupled to memory 804, display 816 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 810, which may accept various expansion cards (not shown). In the implementation, low-speed controller 812 is coupled to storage device 806 and low-speed expansion port 814. The low-speed expansion port, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.

The computing device 800 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 820, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 824. In addition, it may be implemented in a personal computer such as a laptop computer 822. Alternatively, components from computing device 800 may be combined with other components in a mobile device (not shown), such as device 850. Each of such devices may contain one or more of computing device 800, 850, and an entire system may be made up of multiple computing devices 800, 850 communicating with each other.

Computing device 850 includes a processor 852, memory 864, an input/output device such as a display 854, a communication interface 866, and a transceiver 868, among other components. The device 850 may also be provided with a storage device, such as a microdrive or other device, to provide additional storage. Each of the components 850, 852, 864, 854, 866, and 868 is interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.

The processor 852 can execute instructions within the computing device 850, including instructions stored in the memory 864. The processor may be implemented as a chipset of chips that include separate and multiple analog and digital processors. Additionally, the processor may be implemented using any of a number of architectures. For example, the processor may be a CISC (Complex Instruction Set Computers) processor, a RISC (Reduced Instruction Set Computer) processor, or a MISC (Minimal Instruction Set Computer) processor. The processor may provide, for example, for coordination of the other components of the device 850, such as control of user interfaces, applications run by device 850, and wireless communication by device 850.

Processor 852 may communicate with a user through control interface 858 and display interface 856 coupled to a display 854. The display 854 may be, for example, a TFT (Thin-Film-Transistor Liquid Crystal Display) display or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. The display interface 856 may comprise appropriate circuitry for driving the display 854 to present graphical and other information to a user. The control interface 858 may receive commands from a user and convert them for submission to the processor 852. In addition, an external interface 862 may be provided in communication with processor 852, so as to enable near area communication of device 850 with other devices. External interface 862 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used.

The memory 864 stores information within the computing device 850. The memory 864 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. Expansion memory 874 may also be provided and connected to device 850 through expansion interface 872, which may include, for example, a SIMM (Single In Line Memory Module) card interface. Such expansion memory 874 may provide extra storage space for device 850, or may also store applications or other information for device 850. Specifically, expansion memory 874 may include instructions to carry out or supplement the processes described above, and may include secure information also. Thus, for example, expansion memory 874 may be provided as a security module for device 850, and may be programmed with instructions that permit secure use of device 850. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.

The memory may include, for example, flash memory and/or NVRAM memory, as discussed below. In one implementation, a computer program product is tangibly embodied in an information carrier. The computer program product contains instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 864, expansion memory 874, or memory on processor 852 that may be received, for example, over transceiver 868 or external interface 862.

Device 850 may communicate wirelessly through communication interface 866, which may include digital signal processing circuitry where necessary. Communication interface 866 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur, for example, through radio-frequency transceiver 868. In addition, short-range communication may occur, such as using a Bluetooth, WiFi, or other such transceiver (not shown). In addition, GPS (Global Positioning System) receiver module 870 may provide additional navigation- and location-related wireless data to device 850, which may be used as appropriate by applications running on device 850.

Device 850 may also communicate audibly using audio codec 860, which may receive spoken information from a user and convert it to usable digital information. Audio codec 860 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of device 850. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on device 850.

The computing device 850 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 880. It may also be implemented as part of a smartphone 882, personal digital assistant, or other similar mobile device.

Additionally, computing device 800 or 850 can include Universal Serial Bus (USB) flash drives. The USB flash drives may store operating systems and other applications. The USB flash drives can include input/output components, such as a wireless transmitter or USB connector that may be inserted into a USB port of another computing device.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), peer-to-peer networks (having ad-hoc or static members), grid computing infrastructures, and the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Although a few implementations have been described in detail above, other modifications are possible. Moreover, other mechanisms for performing the systems and methods described in this document may be used. In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.
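As a non-limiting illustration of the flow described in this document (receive an input URL, identify the related user account, select the normalization rules activated for that account from a larger collection, normalize, and store analysis information under an indication of the normalized URL), a minimal Python sketch follows. The rule names, the account identifiers, the in-memory dictionaries, and the choice of SHA-256 as the hash are all assumptions made for the example and are not taken from this document.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


# Hypothetical normalization rules; each takes a URL string and returns one.
def lowercase_host(url):
    p = urlsplit(url)
    # Lowercases the whole network location (host, port, any userinfo).
    return urlunsplit((p.scheme, p.netloc.lower(), p.path, p.query, p.fragment))


def strip_fragment(url):
    p = urlsplit(url)
    return urlunsplit((p.scheme, p.netloc, p.path, p.query, ""))


def strip_www(url):
    p = urlsplit(url)
    netloc = p.netloc[4:] if p.netloc.lower().startswith("www.") else p.netloc
    return urlunsplit((p.scheme, netloc, p.path, p.query, p.fragment))


# The larger collection of rules (assumed names).
RULES = {
    "lowercase_host": lowercase_host,
    "strip_fragment": strip_fragment,
    "strip_www": strip_www,
}

# Rules activated per user account (assumed accounts and selections).
ACTIVATED = {
    "account-a": ["lowercase_host", "strip_fragment"],
    "account-b": ["strip_www"],
}

# Analysis information keyed by a hash of the normalized URL.
STORE = {}


def normalize(url, account):
    """Apply only the rules activated for the given account, in order."""
    for name in ACTIVATED.get(account, []):
        url = RULES[name](url)
    return url


def url_key(normalized_url):
    """Hash the normalized URL; the digest serves as its indication."""
    return hashlib.sha256(normalized_url.encode("utf-8")).hexdigest()


def store_analysis(url, account, info):
    """Normalize, hash, and store info under the hashed value."""
    key = url_key(normalize(url, account))
    STORE.setdefault(key, []).append(info)
    return key


def lookup(key):
    """Retrieve stored info using only the hashed value, not the input URL."""
    return STORE.get(key, [])
```

In this sketch, two different input URLs submitted for the same account can resolve to the same key, so information from both analyses is retrievable with a single identifier, while a different account with a different rule selection may produce a different normalized URL for the same input.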

1. A computer-implemented method, comprising: receiving, by a computing system, a request to normalize an input Uniform Resource Locator (URL); identifying, by the computing system, a user account to which the request to normalize the input URL relates; determining, by the computing system, a selected set of URL normalization rules that are identified as being activated for the user account from among a larger collection of URL normalization rules; normalizing, by the computing system, the input URL using the selected set of URL normalization rules to generate a normalized URL; and storing, by the computing system, information that results from an analysis of the input URL in association with an indication of the normalized URL.
 2. The computer-implemented method of claim 1, comprising: receiving, by the computing system, a second request to normalize a second input URL, the second input URL being different from the input URL; identifying, by the computing system, that the second request to normalize the second input URL relates to same said user account to which the request to normalize the input URL relates; normalizing, by the computing system, the second input URL using the selected set of URL normalization rules to generate same said normalized URL, based on the selected set of URL normalization rules being activated for the user account; and storing, by the computing system, second information that results from an analysis of the second input URL in association with the indication of the normalized URL.
 3. The computer-implemented method of claim 2, comprising: receiving, by the computing system, a request for information stored in association with the indication of the normalized URL; and accessing, by the computing system, the information and the second information using the indication of the normalized URL to access the information and the second information, without the accessing using the input URL and without the accessing using the second input URL.
 4. The computer-implemented method of claim 1, comprising: providing, by the computing system for receipt by a computing device at which the user account has logged in, information to cause the computing device to present a user interface to activate normalization rules, the user interface presenting content that indicates multiple URL normalization rules from the collection of URL normalization rules along with corresponding activation interface elements that enable user input to selectively activate selected URL normalization rules of the multiple URL normalization rules that are indicated by the user interface; and receiving, by the computing system, an indication that user input at the computing device interacted with activation interface elements presented by the user interface to activate the selected set of URL normalization rules for the user account.
 5. The computer-implemented method of claim 4, wherein: the multiple URL normalization rules represent the collection of URL normalization rules, such that the user interface presents content that indicates all URL normalization rules of the collection of URL normalization rules.
 6. (canceled)
 7. The computer-implemented method of claim 4, wherein the indication that user input at the computing device that interacted with the activation interface elements to activate the selected set of URL normalization rules includes: an indication that a first user input that interacted with a first activation interface element to activate a first URL normalization rule from the collection of URL normalization rules; and an indication that a second user input that interacted with a second activation interface element to activate a second URL normalization rule from the collection of URL normalization rules.
 8. The computer-implemented method of claim 7, wherein: the first user input that activated the first URL normalization rule included a first user interaction with the first activation element without entry of user-specified text, the first URL normalization rule not being customizable by user input through interaction with the user interface.
 9. The computer-implemented method of claim 8, wherein: the second user input that activated the second URL normalization rule included a second user interaction with the second activation element to enter user-specified text, the user-specified text indicating text content in URLs on which the second URL normalization rule is to operate.
 10. The computer-implemented method of claim 9, wherein: the second URL normalization rule is configured to modify the user-specified text in a manner that is defined by the second URL normalization rule and not defined by the user-specified text, the manner that the second URL normalization rule is configured to modify the user-specified text not being customizable by user input through interaction with the user interface.
 11. The computer-implemented method of claim 1, comprising: receiving, by the computing system, a second request to normalize a second input URL; identifying, by the computing system, a second user account to which the second request to normalize the second input URL relates; determining, by the computing system, a second selected set of URL normalization rules that are identified as being activated for the second user account from among the larger collection of URL normalization rules, the second selected set of URL rules that are activated for the second user account being different from the selected set of URL normalization rules that are activated for the user account; normalizing, by the computing system, the second input URL using the second selected set of URL normalization rules to generate a second normalized URL, the second normalized URL being different from the normalized URL; and storing, by the computing system, information that results from an analysis of the second input URL in association with an indication of the second normalized URL.
 12. The computer-implemented method of claim 1, wherein: identifying the user account to which the request to normalize the input URL relates includes identifying that the request to normalize the input URL specifies the user account.
 13. The computer-implemented method of claim 1, comprising: hashing, by the computing system, the normalized URL to identify a hashed value that represents the normalized URL, the indication of the normalized URL comprising the hashed value.
 14. The computer-implemented method of claim 13, comprising: receiving, by the computing system, a request for information that results from a URL analysis, the request for information that results from the URL analysis including the hashed value; and accessing, by the computing system using the hashed value and without using the input URL responsive to receiving the hashed value in the request for information that results from the URL analysis, the information that results from the analysis of the input URL.
 15. The computer-implemented method of claim 14, comprising: accessing, by the computing system using the hashed value and without using the input URL responsive to receiving the hashed value in the request for information that results from the URL analysis, information that results from an analysis of a second URL, wherein the normalized URL represents a normalized version of the second URL when the second URL is normalized according to the selected set of normalization rules.
 16. The computer-implemented method of claim 1, comprising: providing, by the computing system for receipt by a computing device at which the user account has logged in, information to cause the computing device to present a user interface to activate normalization rules, the user interface presenting activation interface elements that enable user input to (i) select a selected group of URLs from among multiple different groups of URLs, and (ii) select a portion of the collection of URL normalization rules to apply to the selected group of URLs; receiving, by the computing system, user input that interacts with the activation interface elements of the user interface to specify that the selected set of URL normalization rules are to apply to URLs in a first group of URLs from among the multiple different groups of URLs; and identifying, by the computing system, that the input URL is part of the first group of URLs, wherein determining the selected set of URL normalization rules that are identified as being activated for the user account includes identifying that the selected set of URL normalization rules were specified by user input as applying to the first group of URLs and that the input URL is part of the first group of URLs.
 17. The computer-implemented method of claim 16, comprising: receiving, by the computing system, user input that interacts with one or more of the activation interface elements presented as part of the user interface to specify that a second selected set of URL normalization rules are to apply to URLs in a second group of URLs from among the multiple different groups of URLs, wherein the second group of URLs is different from the first group of URLs, wherein the second selected set of URL normalization rules are different from the selected set of URL normalization rules.
 18. The computer-implemented method of claim 17, comprising: receiving, by a computing system, a request to normalize a second input URL; identifying, by the computing system, that the second input URL is part of the second group of URLs; normalizing, by the computing system, the second input URL using the second selected set of URL normalization rules to generate a second normalized URL, wherein the second selected set of normalization rules includes a normalization rule that is not within the selected set of normalization rules; hashing, by the computing system, the second normalized URL using a combination of all URL rules from the selected set of URL normalization rules and all URL rules from the second selected set of URL normalization rules, including the normalization rule that is not within the second selected set of normalization rules and excluding multiple normalization rules from the collection of URL normalization rules that are not activated for the user account, to identify a second hashed value that identifies the second normalized URL; and storing, by the computing system, information that results from an analysis of the second input URL in association with the second hashed value.
 19. A computing system, comprising: one or more processors; and one or more computer-readable devices including instructions that, when executed by the one or more processors, cause the computing system to perform operations that include: receiving, by the computing system, a request to normalize an input Uniform Resource Locator (URL); identifying, by the computing system, a user account to which the request to normalize the input URL relates; determining, by the computing system, a selected set of URL normalization rules that are identified as being activated for the user account from among a larger collection of URL normalization rules; normalizing, by the computing system, the input URL using the selected set of URL normalization rules to generate a normalized URL; and storing, by the computing system, information that results from an analysis of the input URL in association with an indication of the normalized URL.